Node.js crawler: superagent, cheerio, and nodejssuperagent
Preface
I had heard of crawlers for a long time. I started learning Node.js a few days ago and wrote a crawler that collects the article titles, user names, read counts, recommendation counts, and user profiles from the Cnblogs ("blog garden") homepage. Here is a short summary.
The following pieces are used:
1. The Node core File System module
2. A third-party module for HTTP requests-
Goal
Create a lesson3 project and write the code in it. When http://localhost:3000/ is opened in a browser, it outputs the titles and links of all posts on the CNode (https://cnodejs.org/) community home page as JSON.
Knowledge Points:
Learn to crawl web pages with superagent
Learn to analyze web pages with cheerio
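The goal above can be sketched roughly as follows. This is only an illustration under some assumptions: the selector `#topic_list .topic_title` is a guess at CNode's markup and is not taken from the text, the names `absoluteLink`, `fetchTopics`, and `startServer` are hypothetical, and Node's built-in http module stands in for whatever web framework the lesson itself uses.

```javascript
const http = require('http');

const CNODE_URL = 'https://cnodejs.org/';

// Resolve a possibly relative href against the page it was found on.
function absoluteLink(href, base) {
  return new URL(href, base).href;
}

// Fetch the home page and collect { title, href } for each post.
// The selector below is an assumption about the page structure.
async function fetchTopics() {
  // Required here so the helper above works without the packages installed.
  const superagent = require('superagent');
  const cheerio = require('cheerio');

  const res = await superagent.get(CNODE_URL);
  const $ = cheerio.load(res.text);
  const topics = [];
  $('#topic_list .topic_title').each((index, element) => {
    const $element = $(element);
    topics.push({
      title: $element.attr('title'),
      href: absoluteLink($element.attr('href'), CNODE_URL),
    });
  });
  return topics;
}

// Serve the result as JSON on http://localhost:3000/.
function startServer(port = 3000) {
  return http
    .createServer((req, res) => {
      fetchTopics()
        .then((topics) => {
          res.writeHead(200, { 'Content-Type': 'application/json; charset=utf-8' });
          res.end(JSON.stringify(topics));
        })
        .catch((err) => {
          res.writeHead(502, { 'Content-Type': 'text/plain' });
          res.end('Upstream request failed: ' + err.message);
        });
    })
    .listen(port);
}

// Start listening only when asked to, e.g. `node app.js serve`.
if (process.argv[2] === 'serve') {
  startServer();
}
```

Running the file with the `serve` argument (after `npm install superagent cheerio`) and opening http://localhost:3000/ should return the collected list, provided the assumed selector still matches the page.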
Library Introduction:Superagent (http://visionmedia.github.io/superagent/) is
1. Modules used
(1) superagent: an HTTP request library for Node.js (every language has countless of these, e.g. OkHttp for Java and AFNetworking for iOS)
(2) cheerio: an HTML parsing library for Node.js (every language has one)
(3) async: a library for running functions in parallel or asynchronously in Node.js (it is very capable; few other languages have something of the same kind)
2. Content crawled
The League of Legends hero page on Duowan: parse the URL of each hero within the page, then request each hero's detailed data and extract the required d
'use strict';
const Controller = require('egg').Controller;
const request = require('superagent');
class ApiController extends Controller {
  async file() {
    const { ctx, app } = this;
    // Egg's built-in interface for getting the uploaded file, reference address:
    const stream = await ctx.getFileStream();
    const url = 'http://demo/api/file';
    // attach needs 3 parameters; the official documentation says that the file name is
This article introduces usage examples of Node.js superagent requests. It summarizes examples of POST, GET, DELETE, and PUT requests and is recommended to anyone who needs them.
POST request:
The code is as follows:
request.post('/api/pet').end(function (err, res) {
  if (res.body.status === 200) {
    alert('Yay got ' + JSON.stringify(res.body));
  } else {
    return next(res.body);
  }
});
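As a rough sketch of how the four request types mentioned here might sit side by side, the helper below wraps a superagent-style object. The name `createPetClient`, the base URL, and the `/api/pet/:id` paths are hypothetical; only `get`, `post`, `put`, `delete`, `query`, and `send` are superagent calls. The request object is passed in as a parameter so the helper can also be exercised with a stub.

```javascript
// Build a small client around a superagent-style `request` object.
// Every method returns the request so the caller can await it or call .end().
function createPetClient(request, baseUrl) {
  return {
    // GET with query-string parameters
    list: (query) => request.get(baseUrl + '/api/pet').query(query),
    // POST with a JSON body
    create: (pet) => request.post(baseUrl + '/api/pet').send(pet),
    // PUT with a JSON body (the id-based path is an assumption)
    update: (id, pet) => request.put(baseUrl + '/api/pet/' + id).send(pet),
    // DELETE by id
    remove: (id) => request.delete(baseUrl + '/api/pet/' + id),
  };
}
```

With superagent installed, usage could look like `createPetClient(require('superagent'), 'http://localhost:3000').list({ species: 'dog' }).then((res) => console.log(res.body))`.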
GET Request:
This article introduces usage examples of Node.js superagent requests. It summarizes examples of POST, GET, DELETE, and PUT requests and recommends them to you.
POST request:
The code is as follows:
request.post('/api/pet').end(function (err, res) {
  if (res.body.status === 200) {
    alert('Yay got ' + JSON.stringify(res.body));
  } else {
    return next(res.body);
  }
});
GET request:
The code is as follows
article has already covered this, so it is not repeated here.
2. New modules to install:
superagent
Role: Similar to request. We can use it to send GET, POST, and other requests and to set the relevant request headers, which is much simpler than using the built-in modules.
Usage:
var superagent = require('superagent
This article is reproduced from: How to use Nodejs to analyze a simple page, by Han Zichi
Goal
Enter localhost:3000 in the browser address bar, and the page displays the 20 article titles from the Cnblogs home page.
Process Analysis
The first step is to listen on a port, which requires one of the most important modules in Node, express. Next, we need to send an HTTP request to the http://www.cnblogs.com/ page to get the page data for analysis, which requires the intr
This article walks through the whole process of building a crawler in Node.js: setting up the project, analyzing the target website, using superagent to fetch the source data, using cheerio to parse it, and using eventproxy to fetch the content of each topic concurrently. Today I am working through alsotang's crawler tutorial and will follow it with a simple crawl of CNode.
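The concurrent fetching step can be illustrated without eventproxy. The sketch below is not the tutorial's approach: it swaps in plain promises with a cap on how many requests are in flight at once, and the name `mapLimit` and its signature are hypothetical.

```javascript
// Run `worker` over `items`, keeping at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

In a crawler, `items` would be the topic URLs collected from the home page and `worker` a function that requests one URL and returns the parsed fields; a small `limit` keeps the load on the target site modest.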
Create the craelr-demo project
First, create an Express pro
Recently, while working on a bookstore project, I needed a data crawler. A quick Baidu search turned up this site, and I picked the novel Ze Tian Ji as the example.
The crawler uses several modules: cheerio, superagent, and async.
superagent is an HTTP request module; details can be found at the link.
cheerio is a document parsing module with jQuery-like syntax; you can think of it simply as jQuery for Node.js.
async is an asynchronous flow-control module, where we mainl
Encoding errors when crawling data with Node.js
They can be handled with the superagent-charset and superagent modules:
var charset = require('superagent-charset');
var cheerio = require('cheerio');
var superagent = require('superagent');
charset(
An unfixed SQL injection on a Lianjia (HomeLink) website leads to getshell and allows penetration into the intranet, where a large number of employee accounts can be viewed.
URL: http://tc.homelink.com.cn/,
#1 getshell and get the highest server PermissionsThe account (jfdhb/jfdhb123456) obtained by the last shell (which has been in existence for several days and has not been modifi
Today I am working through alsotang's crawler tutorial and following along with a simple crawl of CNode.
Create the craelr-demo project
We first create an Express project and then delete the entire contents of app.js, because we do not need to present anything on the web side for the time being. Alternatively, we can run npm install express directly in an empty folder to get just the Express functionality we need.
Target site analysis
As pictured, this is part of the div tags of the CNode home page; we use this series of ids and class t
. If you do not want to fix an IP address in advance, you can instead choose sn verification, which uses MD5 as the signing algorithm. At first I chose sn verification, but when calling crypto to generate the MD5 signature I could only fall back to IP whitelist verification.
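For reference, producing an MD5 hex digest with Node's built-in crypto module looks roughly like the sketch below. The exact rule for building the sn value is not given in the text, so `signRequest` is a hypothetical illustration of one possible scheme (URL-encode the request string plus a secret key, then hash); the service's own documentation is the authority on the real rule.

```javascript
const crypto = require('crypto');

// Hex-encoded MD5 digest of a UTF-8 string.
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Hypothetical signing rule: digest of the URL-encoded request string plus a secret key.
function signRequest(requestString, secretKey) {
  return md5Hex(encodeURIComponent(requestString + secretKey));
}

console.log(md5Hex('hello')); // 5d41402abc4b2a76b9719d911017c592
```

A mismatch in what gets encoded, or in what order, before hashing is a common reason a signature is rejected, so it helps to compare the exact input string against the provider's worked example.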
2. Query with Node.js
With the interface for calling, we can write a small script to request data. We need three dependencies: express, s
the file is in a subdirectory, such as src/page/user/pay.vue, then the corresponding scss file is src/style/scss/user/_pay.scss.
Every team has its own conventions, each with its own strengths; what matters is staying organized.
Call api.js
In the second section, we created an empty api.js file under the src/config directory. It was not used in the third section. In this section, we start using it.
First, we edit src/main.js to reference src/config/api.js, as follows:
Impor